Abstract: Face detection is an essential step in many computer
vision applications such as surveillance, tracking, medical analysis
and facial expression analysis. Several approaches to face detection
have been proposed. Among them, the method based on Haar-like
features is robust, but it has some limitations. With some simple
modifications to the algorithm, it can be made faster and more
robust. The present work increases the speed of the original
algorithm by down sampling the frames, and analyses the effect of
different scale factors. It also discusses the detection of tilted
faces using an affine transformation of the input image.
I. Introduction

Object detection, such as face and eye detection, is an
important step in many vision-based applications, including
video surveillance, tracking, medical analysis and facial
expression analysis. Many researchers have proposed different
methods for face detection. Among these methods, the use of
Haar-like features is found to be quite robust. However, its
processing rate is too slow for applications requiring high
frame rates. The classifier based on Haar-like features detects
frontal faces accurately, but some applications require detection
of faces even with small tilts. Several approaches [12]-[16]
have been made to detect tilted faces. This paper discusses
improvements to the Haar-like features based method and
their analysis. There are three basic contributions in this paper.
Firstly, it analyses the change of processing speed with the
down sampling scale factor (SF). Secondly, it analyses the
change of detection accuracy with different SFs. Finally,
the algorithm is improved by performing an affine
transformation [16] on the input image to detect tilted
faces. The combined algorithm is implemented on a single
board computer. This paper is organized as follows: Section II
gives an introduction to face detection using Haar-like features
and the modifications made to improve the robustness of the
algorithm. Section III analyses the speed versus scale factor.
In Section IV, the accuracy versus scale factor is analysed.
Section V discusses the results obtained in Sections III and IV.
Section VI deals with the detection of tilted faces using an affine
transformation. Section VII concludes the paper.


II. The Algorithm 

A. Haar-like Features 

Haar-like features are features of a digital image that are
used in object detection. They are so named on account of
their similarity to Haar wavelets. Haar-like features are
used in real-time object detection. The algorithm was first
developed by Viola et al. and was later extended by
Lienhart et al. The algorithm [1] was found to achieve a 95%
accuracy rate for the detection of a human face using only
200 simple features.

B. Speeding up Operation 

In the original algorithm, the full-resolution picture is
examined to detect the face. Integral images [1] are used
to calculate features rapidly at multiple resolutions. Once the
integral image is computed, any Haar-like feature can be
obtained at any scale or location in constant time. It follows
that, after the integral image has been calculated, the time
needed to evaluate each Haar-like feature is constant. So, in
order to reduce the detection time, the time taken to calculate
the integral image must be minimized. This time increases with
the size of the image. The image is therefore down sampled,
which reduces the number of pixels and improves the speed of
face detection. In order to detect the eyes, a region of interest
(ROI) is selected from the face region detected in the down
sampled image. The ROI must be at the same resolution as the
image captured from the camera to detect the eyes with maximum
accuracy. This is achieved by remapping the ROI co-ordinates to
the original image.
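
The constant-time property of the integral image can be illustrated with a minimal sketch. The function names and the tiny test image below are illustrative assumptions and are not taken from the original implementation; the two-rectangle feature is only one simple example of a Haar-like feature.

```python
def integral_image(img):
    """Return a zero-padded integral image: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle at (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 4 x 3 image used only for demonstration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))          # sum of 6, 7, 10, 11
print(two_rect_feature(ii, 0, 0, 4, 3))  # left columns minus right columns
```

Building the integral image visits every pixel once, so its cost grows with the image size, whereas each rectangle sum needs only four lookups regardless of the rectangle size.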

C. Remapping ROI Co-ordinates to Original Image 

The original image is stored before it is down sampled.
Once face detection is over, the coordinates of the ROI in the
down sampled image are obtained. These are remapped
on to the original image to obtain the ROI for eye detection.
Compared to the original algorithm, this method alters neither
the detection rate of the eyes nor the accuracy, and at the same
time raises the processing speed from 3 fps to as much as 10 fps.
Fig. 1 describes the scheme of remapping of the ROI.
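
The remapping step can be sketched as follows, assuming the ROI is an axis-aligned box given as (x, y, width, height) and that both axes were down sampled by the same SF. The function name, the box format and the clamping to the frame borders are assumptions made for illustration, not details given in the paper.

```python
def remap_roi(roi, scale_factor, frame_width, frame_height):
    """Map an (x, y, w, h) box from the down sampled frame to the original."""
    x, y, w, h = roi
    # Scale both corners of the box back up by the down sampling factor.
    x0 = int(round(x * scale_factor))
    y0 = int(round(y * scale_factor))
    x1 = int(round((x + w) * scale_factor))
    y1 = int(round((y + h) * scale_factor))
    # Clamp to the original frame so the ROI never leaves the image.
    x0 = max(0, min(x0, frame_width))
    y0 = max(0, min(y0, frame_height))
    x1 = max(0, min(x1, frame_width))
    y1 = max(0, min(y1, frame_height))
    return x0, y0, x1 - x0, y1 - y0


# Hypothetical face box found in a 160 x 120 frame (640 x 480 with SF = 4).
face_small = (20, 15, 30, 30)
print(remap_roi(face_small, 4, 640, 480))
```

The eye detector would then be run on the returned box cut from the stored full-resolution frame.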

III. Speed and Scale Factor Analysis 


The full version of the method described in this paper is available in:
Dasgupta, Anirban, Anjith George, S. L. Happy, and Aurobinda Routray. "A
Vision-Based System for Monitoring the Loss of Attention in Automotive
Drivers." Intelligent Transportation Systems, IEEE Transactions on 14, no. 4
(2013): 1825-1838.


A. Experiment Design 

We define the SF as follows:

$$SF = \frac{\text{no. of vertical pixels in original frame}}{\text{no. of vertical pixels in downsampled frame}} \tag{1}$$

or

$$SF = \frac{\text{no. of horizontal pixels in original frame}}{\text{no. of horizontal pixels in downsampled frame}} \tag{2}$$

Fig. 1: ROI remapping

For the analysis of speed versus SF, an experiment is
conducted. Six subjects are chosen, and videos of facial and
non-facial images are recorded under laboratory conditions and
stored at 30 fps at a resolution of 640 x 480 pixels in .avi
format. Each incoming frame is down sampled by SFs of 2, 4, 6,
8 and 10. The processing speed for each video is noted.
The processing is done on a computer with an Intel dual-core
processor running at 2.00 GHz and 2 GB of DDR2 RAM.
Fig. 2 shows some sample images extracted from the videos.
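
As a worked example of the SF definition, the sketch below lists the frame size that results from down sampling a 640 x 480 frame by each SF. Integer division is an assumption here, since the paper does not state how non-integer sizes such as 640 / 6 are rounded.

```python
def downsampled_size(width, height, scale_factor):
    """Return (width, height) after down sampling both axes by scale_factor."""
    # Integer division is an assumption; the rounding rule is not given.
    return width // scale_factor, height // scale_factor


ORIGINAL = (640, 480)
for sf in (2, 4, 6, 8, 10):
    w, h = downsampled_size(*ORIGINAL, sf)
    print(f"SF={sf:2d}: {w} x {h} = {w * h} pixels")
```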


Fig. 3: Graph showing speed versus SF



B. Results 


Fig. 4: Some detection results 


Face and eyes are detected as the algorithm is run on the
videos. Fig. 4 shows some face detection results. The average
speed over all subjects for each SF is plotted in Matlab
against SF. Table I shows the speed obtained for each subject
at different SFs. Fig. 3 shows the plot of average speed
versus SF.


Fig. 2: Some sample test dataset 


IV. Accuracy and Scale Factor Analysis 
A. Experiment Design 

The accuracy of detection is analyzed by plotting ROC
curves and then calculating the area under the curve (AUC).
The same videos that were used for the speed analysis are used
for the analysis of accuracy versus SF. First, the frames
are extracted as JPEG image files using Free Video to JPEG
Converter. Then, the images are manually marked to record
the presence of face and eyes. A Matlab Graphical
User Interface (GUI) is made to store the ground truth. Then
the program for face and eye detection is run and the detection
results are stored in another Excel sheet. The detection results
are then compared with the ground truth to obtain the number
of true positives (tp), false positives (fp), true negatives (tn) and
false negatives (fn). The true positive rate (tpr) is calculated
using

$$tpr = \frac{tp}{tp + fn} \tag{3}$$
SF    Sub1     Sub2     Sub3     Sub4     Sub5     Sub6     Average
1     2.45     2.13     2.33     2.43     2.49     2.6      2.405
2     9.78     9.09     7.61     6.26     8.69     7.1      8.088
4     17.64    23.75    20.78    21.96    21.16    23.58    21.478
6     28.69    28.39    31.46    29.1     29.84    31.43    29.818
8     33.26    30.63    35.27    31.45    35.27    32.26    33.023
10    40.14    37.54    43.11    32.33    43.11    39.55    39.297
12    46.56    46.56    46.56    43.11    46.56    51.08    46.738


TABLE I: Processing speed versus SF 


The false positive rate (fpr) is calculated using

$$fpr = \frac{fp}{fp + tn} \tag{4}$$
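
The comparison of detection results with the ground truth can be sketched as below. The per-frame boolean lists are hypothetical illustration data, not results from the experiment, and the function names are assumptions.

```python
def confusion_counts(ground_truth, detected):
    """Count tp, fp, tn, fn from per-frame ground truth and detector output."""
    tp = fp = tn = fn = 0
    for truth, det in zip(ground_truth, detected):
        if truth and det:
            tp += 1
        elif not truth and det:
            fp += 1
        elif not truth and not det:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (tpr, fpr); a rate with an empty denominator is reported as 0."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels for six frames (True = face present / face detected).
truth = [True, True, True, False, False, True]
found = [True, False, True, True, False, True]
counts = confusion_counts(truth, found)
print(counts, rates(*counts))
```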


B. Results 

The plot of tpr versus fpr is called the ROC curve [17]. The
ROC curve for each SF is obtained.

After obtaining the observations below, the graph of AUC
versus SF is plotted in Matlab. The accuracy of the
classifier can be calculated as in equation (5):

$$Accuracy = \frac{tp + tn}{p + n} \tag{5}$$

where p is the number of positive images and n the number
of negative images.
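
A minimal sketch of the two summary measures follows. The accuracy function follows the definition above directly. The AUC function uses the trapezoidal rule, which is an assumption, since the paper does not state how the area was computed; the ROC points are hypothetical.

```python
def accuracy(tp, tn, p, n):
    """Accuracy = (tp + tn) / (p + n), with p positive and n negative images."""
    return (tp + tn) / (p + n)


def auc_trapezoid(roc_points):
    """Area under a ROC curve given as (fpr, tpr) points, by trapezoids."""
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points, including the (0, 0) and (1, 1) end points.
roc = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
print(auc_trapezoid(roc))
print(accuracy(tp=3, tn=1, p=4, n=2))
```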


SF    Area Under the Curve (AUC)
1     0.8460
2     0.8461
4     0.8493
6     0.8274
8     0.5790
10    0.1723
12    0.0833


TABLE II: AUC versus SF 



SF    Sub1      Sub2      Sub3      Sub4      Sub5      Sub6
1     0.9936    0.9914    0.9986    0.9929    0.8186    0.9236
2     0.9979    0.9929    0.9986    0.9993    0.8021    0.9993
4     0.9957    1         0.9907    0.9993    0.7671    1
6     0.9943    0.9943    0.9229    0.9907    0.7579    1
8     0.9700    0.9129    0.4621    0.9008    0.4871    0.9793
10    0.5336    0.2579    0.2029    0.3669    0.2700    0.3414
12    0.2500    0         0.2029    0.2684    0         0.2179



TABLE III: Accuracy versus SF


Fig. 5: ROC comparisons for different SFs


Fig. 6: SF vs Accuracy


V. Discussions

The speed versus SF analysis reveals that the speed
of operation increases non-linearly with SF. This observation
can be explained as follows. With down sampling of images,
the number of pixels to be operated on reduces by a factor
of SF^2 (a factor of SF along each dimension). This reduces
the time needed to calculate the integral image. Once the
integral image is computed, the time needed to evaluate a
Haar-like feature at any scale and location is constant.
Further, the number of sub-windows to be searched is also
reduced, which improves the speed even more. Hence, the
speed increases with the down sampling SF. The accuracy
versus SF analysis shows that the accuracy and AUC remain
almost constant up to an SF of 6, drop sharply up to an
SF of 10 and level off thereafter.
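
The non-linear relation between speed and SF can be checked against the average speeds reported in Table I. The sketch below computes the speed-up relative to SF = 1 and, for comparison, the reduction in pixel count, which is SF^2 when both dimensions are scaled by SF (an inference from the definition of SF, not a figure stated in the paper).

```python
# Average processing speed per SF, copied from the last column of Table I.
avg_speed = {1: 2.405, 2: 8.088, 4: 21.478, 6: 29.818,
             8: 33.023, 10: 39.297, 12: 46.738}

baseline = avg_speed[1]
speedup = {sf: speed / baseline for sf, speed in avg_speed.items()}

for sf in sorted(speedup):
    # sf * sf is the factor by which the number of pixels shrinks.
    print(f"SF={sf:2d}: speed-up x{speedup[sf]:.2f}, pixel reduction x{sf * sf}")
```

For the larger SFs the speed-up is well below the pixel reduction, which suggests that part of the per-frame cost does not scale with the number of pixels.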


VI. Tilted Face Detection Using an Affine Transformation

Fig. 7: SF vs AUC


The original Haar cascade technique applied to face detection
detects frontal faces only. A face with a moderate amount
of tilt will not be detected, and consequently the eyes will
not be detected in such frames either. Some applications
require tilted faces to be detected, so an approach for
tilted face detection is needed. Several approaches
have been made for this purpose. We have adopted an affine
transformation based method for the detection of tilted
(in-plane rotated) images. The rotation matrix can be found for
an n-dimensional image once its size, center and the required
angle of rotation are known. This step is combined with down
sampling to make a robust face detection algorithm. Fig. 8
shows tilted face detection using an affine transformation. A
detailed description of this algorithm can be found in our
earlier works [18], [19]. The algorithm has been tested on the
KGP-NIR face database [20].
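
The rotation about the image center can be written as a 2 x 3 affine matrix. The sketch below is an illustration only: it uses the same matrix form as OpenCV's getRotationMatrix2D with unit scale, which is an assumption, since the paper does not state which library or convention was used. The function names and the 90 degree angle are likewise hypothetical.

```python
import math


def rotation_matrix(center, angle_deg):
    """2 x 3 affine matrix rotating by angle_deg about center (image coords)."""
    cx, cy = center
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(m, point):
    """Apply the 2 x 3 affine matrix m to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


width, height = 640, 480
center = (width / 2, height / 2)
m = rotation_matrix(center, 90)
print(apply_affine(m, center))            # the center stays fixed
print(apply_affine(m, (center[0] + 10, center[1])))
```

Warping the input frame with such a matrix before running the frontal-face cascade brings an in-plane rotated face closer to upright.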



Fig. 8: Tilted face detection using an affine transformation 


VII. Conclusion 

This paper gives an analysis of speed versus SF and
accuracy versus SF. The experiments show that down sampling
up to an SF of 6 gives an appreciable increase in speed
without much loss of accuracy, which improves real-time
performance. The method described in this paper can be used
in applications to detect eyes along with the face. By adding
an affine transformation to detect tilted faces, the algorithm
is made more robust while retaining its real-time performance.